(54) Title: CANCER GENE DETERMINATION AND THERAPEUTIC SCREENING USING SIGNATURE GENE SETS

(57) Abstract: Processes for assaying potential antitumor agents based on their modulation of the expression of specified suspected cancer cell genes, or sets of such genes, are disclosed, along with methods for diagnosing cancerous, or potentially cancerous, conditions as a result of the expression, or patterns of expression, of such genes, or sets of genes. Also disclosed are methods for determining functionally related genes, or gene sets, as well as methods for treating cancer based on targeting expression products of such genes, or gene sets, and for determining genes involved in the cancerous process.
FIELD OF THE INVENTION

The present invention relates to methods of assaying potential antitumor agents based on their modulation of the expression of specified sets of genes and methods for diagnosing cancerous, or potentially cancerous, conditions as a result of the patterns of expression of such gene sets.



BACKGROUND OF THE INVENTION

Screening assays for novel drugs are based on the response of model cell-based systems in vitro to treatment with specific compounds. Various measures of cellular response have been utilized, including the release of cytokines, alterations in cell surface markers, activation of specific enzymes, and alterations in ion flux and/or pH. Some such screens rely on specific genes, such as oncogenes (or gene mutations).



BRIEF SUMMARY OF THE INVENTION

In accordance with the present invention, there are provided characteristic sets of gene sequences whose expression, non-expression, or change in expression (either an increase or a decrease thereof) is indicative of the cancerous or non-cancerous status of a given cell. More particularly, such genes whose expression is changed in cancerous, as compared to non-cancerous, cells from a specific tissue (in particular, any of those disclosed herein) are genes that include one of the nucleotide sequences of SEQ ID NO: 1 - 8447, or sequences that are substantially identical to said sequences.

Such a change in expression may be an increase or a decrease in expression or activity of the gene or gene sequences disclosed herein.
It is another object of the present invention to provide methods of using such characteristic, or signature, gene sets as a basis for assaying the potential ability of selected chemical agents to modulate, upward or downward, the expression of said characteristic, or signature, gene sets.

It is a further object of the present invention to provide methods of detecting the expression, or non-expression, or amount of expression, of said characteristic, or signature, gene sets, or portions thereof, as a means of determining the cancerous, or non-cancerous, status (or potential cancerous status) of selected cells as grown in culture or as maintained in situ.

It is a still further object of the present invention to provide methods for treating cancerous conditions utilizing selected chemical agents as determined from their ability to modulate (i.e., increase or decrease) the expression of the selected characteristic, or signature, gene sets as disclosed herein, where said genes include, or comprise, one of the sequences of SEQ ID NO: 1 - 8447, or sequences substantially identical to said sequences.

In another aspect, the present invention relates to a process for determining a cancer initiating, facilitating or suppressing gene comprising the steps of contacting a cancerous cell with a cancer modulating agent and determining a change in expression of a gene selected from the group consisting of the gene sequences of SEQ ID NO: 1 - 8447 and thereby identifying said gene as being a cancer initiating or facilitating gene. Said genes may, for example, be oncogenes, cancer facilitating or promoting genes, or cancer suppressor genes. Said agents may increase or decrease gene expression.

The present invention also relates to a process for treating cancer comprising contacting a cancerous cell with an agent having activity against an expression product encoded by a gene sequence selected from the group consisting of SEQ ID NO: 1 - 8447, which process may be conducted either ex vivo or in vivo. Such agents may comprise an antibody or other molecule or portion that is specific for said expression product.

In another aspect, the present invention further relates to a process for determining a cancer initiating, facilitating or suppressing gene in a cancer cell comprising determining a change in expression of a gene sequence selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447. Such change in expression is especially a difference in copy number, either an increase or a decrease thereof, wherein said change is monitored or otherwise determined as a change in messenger RNA formation. Such change can be readily used to diagnose a cancerous condition, either in vivo or ex vivo.

The present invention still further relates to a process for treating cancer comprising inserting into a cancerous cell a gene construct comprising an anti-cancer gene operably linked to a promoter or enhancer element such that expression of said anti-cancer gene causes suppression of said cancer, wherein said promoter or enhancer element is a promoter or enhancer element modulating a gene sequence selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447, and wherein said gene may be a cancer suppressor gene or may encode a polypeptide having anticancer activity, such as one with apoptotic activity.

In an additional aspect, the present invention relates to a process for determining functionally related genes comprising contacting one or more gene sequences selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447 with an agent that modulates expression of more than one gene in such group and thereby determining a subset of genes of said group. Said functionally related genes include genes modulating the same metabolic pathway, genes encoding functionally related polypeptides, and genes whose expression is modulated by the same transcription activator or enhancer sequence.
The present invention also relates to a method for producing a product comprising identifying an agent according to the process of claim 1, wherein said product is the data collected with respect to said agent as a result of said process and wherein said data is sufficient to convey the chemical structure and/or properties of said agent.



DETAILED SUMMARY OF THE INVENTION 


The present invention relates to methods of assaying potential antitumor agents based on their modulation of the expression of specified sets of genes and methods for diagnosing cancerous, or potentially cancerous, conditions as a result of the patterns of expression of such gene sets and for determining cancer-inducing or regulating genes, and gene sets, based on common expression or regulation of such genes, or gene sets.

In accordance with the present invention, model cellular systems using cell lines, primary cells, or tissue samples are maintained in growth medium and may be treated with compounds at a single concentration or at a range of concentrations. At specific times after treatment, cellular RNAs, which are indicative of the expression of selected genes, are isolated from the treated cells, primary cells or tumors. The cellular RNA is then divided and subjected to analysis that detects the presence and/or quantity of specific RNA transcripts, which transcripts may be amplified for detection purposes using standard methodologies such as, for example, reverse transcriptase polymerase chain reaction (RT-PCR). The presence, absence, or levels of specific RNA transcripts are determined from these measurements, and a metric is derived for the type and degree of response of the sample to the test compound as compared to control samples.
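The text does not specify the response metric. As one illustration only, the sketch below assumes the metric is a per-gene log2 ratio of mean treated to mean control expression. The function name, the input layout (a gene identifier mapped to replicate values) and the pseudocount are hypothetical choices, not part of the disclosed method.

```python
import math
from statistics import mean


def log2_fold_changes(treated: dict[str, list[float]],
                      control: dict[str, list[float]],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2 ratio of mean treated to mean control expression.

    Assumes `treated` and `control` map a (hypothetical) gene identifier to
    replicate expression values on a common, already-normalized scale.
    The pseudocount is an assumed constant that keeps undetectable (zero)
    levels finite.
    """
    changes = {}
    # Only genes measured under both conditions can be compared.
    for gene in treated.keys() & control.keys():
        t = mean(treated[gene]) + pseudocount
        c = mean(control[gene]) + pseudocount
        changes[gene] = math.log2(t / c)  # > 0: up in treated, < 0: down
    return changes
```

For example, replicate levels of 15 (treated) against 3 (control) give log2(16/4) = 2.0, a four-fold increase after the pseudocount.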
Also in accordance with the present invention, there are disclosed herein characteristic, or signature, sets of genes and gene sequences whose expression is, or can be, as a result of the methods of the present invention, linked to, or used to characterize, the cancerous, or non-cancerous, status of the cells, or tissues, to be tested. Thus, the methods of the present invention identify novel anti-neoplastic agents based on their alteration of expression of small sets of characteristic, or indicator, or signature genes in specific model systems. The methods of the invention may therefore be used with a variety of cell lines or with primary samples from tumors maintained in vitro under suitable culture conditions for varying periods of time, or in situ in suitable animal models.

More particularly, certain genes have been identified that are expressed at levels in cancer cells that are different from the expression levels in non-cancer cells. In one instance, the identified genes are expressed at higher levels in cancer cells than in normal cells. In another instance, the identified genes are expressed at lower levels in cancer cells as compared to normal cells.

In accordance with the foregoing, the present invention relates to a process for screening for an anti-neoplastic agent comprising the steps of:

(a) exposing cells to a chemical agent to be tested for antineoplastic activity, and

(b) determining a change in expression of at least one gene that includes one of the sequences of SEQ ID NOS: 1 - 8447, or a sequence that is at least 95% identical thereto,

wherein a change in expression is indicative of anti-neoplastic activity.

In particular embodiments, such change in expression may be an increase or a decrease in expression or activity.
More particularly, the present invention relates to a process for screening for an anti-neoplastic agent comprising the steps of:

(a) exposing a known cancerous cell to a chemical agent to be tested for antineoplastic activity;
(b) allowing said chemical agent to modulate the activity of one or more genes present in said cell, wherein said genes include or comprise one of the sequences selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447, sequences substantially identical to said sequences, or the complements of any of the foregoing;
(c) determining the expression of one or more genes of step (b);
(d) comparing the expression of said genes in the presence or absence of exposure to said chemical agent;

wherein a difference in expression is indicative of anti-neoplastic activity.

In specific embodiments of the present invention, said chemical agent to be tested modulates the expression of more than one said gene, especially where it modulates at least two said genes, more especially where at least 3, or at least 5 of said genes, or even 10 or more of said genes in said signature set, are modulated. In a preferred embodiment, all of said genes are modulated.

In one embodiment of the present invention, said gene modulation is downward modulation, so that, as a result of exposure to the chemical agent to be tested, one or more genes of the cancerous cell will be expressed at a lower level (or not expressed at all) when exposed to the agent as compared to the expression when not exposed to the agent.

In a preferred embodiment, a selected set of said genes is expressed in the reference cell but not expressed in the cell to be tested as a result of the exposure of the cell to be tested to the chemical agent. Thus, where said chemical agent causes the gene, or genes, of the tested cell to be expressed at a lower level than the same genes of the reference, this is indicative of downward modulation and indicates that the chemical agent to be tested has antineoplastic activity.
In a separate embodiment, exposure of said cells to be tested to the chemical agent, especially one suspected of having anti-neoplastic activity, may result in upward modulation of said genes of the cell to be tested. Such upward modulation is interpreted as meaning that said genes are expressed where previously not expressed, or else are expressed in greater quantities when exposed to the agent as compared to non-exposure to the agent. Such upward modulation may be taken as indicative of anti-neoplastic activity by the tested chemical agent(s) where increased expression of the gene, or genes, so modulated results in lower neoplastic activity on the part of such cells, such as where increased expression of the gene, or genes, results in decreased growth and/or increased differentiation of said cells away from the cancerous state.

The genes useful in the assay processes include, as a part thereof, at least one of the sequences selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447, or sequences substantially identical thereto. Such sequences also include sequences complementary to any of the sequences disclosed herein.

Genes including sequences at least 90% identical to a sequence selected from SEQ ID NO: 1 - 8447, preferably at least about 95% identical to such a sequence, more preferably at least about 98% identical to such sequence and most preferably 100% identical to such a sequence are specifically contemplated by all of the processes of the present invention. In addition, sequences encoding the same proteins as any of these sequences, regardless of the percent identity of such sequences, are also specifically contemplated by any of the methods of the present invention that rely on any or all of said sequences, regardless of how they are otherwise described or limited. Thus, any such sequences are available for use in carrying out any of the methods disclosed according to the invention. Such sequences also include any open reading frames, as defined herein, present within any of the sequences of SEQ ID NO: 1 - 8447.

Further in accordance with the present invention, the term "percent identity" or "percent identical," when referring to a sequence, means that a sequence is compared to a claimed or described sequence after alignment of the sequence to be compared (the "Compared Sequence") with the described or claimed sequence (the "Reference Sequence"). The Percent Identity is then determined according to the following formula:

Percent Identity = 100[1 - (C/R)]

wherein C is the number of differences between the Reference Sequence and the Compared Sequence over the length of alignment between the Reference Sequence and the Compared Sequence, wherein (i) each base or amino acid in the Reference Sequence that does not have a corresponding aligned base or amino acid in the Compared Sequence, (ii) each gap in the Reference Sequence, and (iii) each aligned base or amino acid in the Reference Sequence that is different from an aligned base or amino acid in the Compared Sequence constitutes a difference; and R is the number of bases or amino acids in the Reference Sequence over the length of the alignment with the Compared Sequence, with any gap created in the Reference Sequence also being counted as a base or amino acid.

If an alignment exists between the Compared Sequence and the Reference Sequence for which the percent identity as calculated above is about equal to or greater than a specified minimum Percent Identity, then the Compared Sequence has the specified minimum percent identity to the Reference Sequence even though alignments may exist in which the hereinabove calculated Percent Identity is less than the specified Percent Identity.
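The Percent Identity definition can be applied mechanically once an alignment is available. The sketch below assumes the Reference and Compared Sequences are supplied as an already-computed alignment: two equal-length strings in which `-` marks a gap. How that alignment is produced is not specified here, and the function name is hypothetical. Differences C and reference length R are counted per alignment column following cases (i) to (iii).

```python
def percent_identity(reference_aln: str, compared_aln: str, gap: str = "-") -> float:
    """Percent Identity = 100 * (1 - C / R) for two pre-aligned sequences.

    Assumes both strings come from an existing alignment (same length, with
    `gap` marking gap positions); the alignment step itself is out of scope.
    """
    if len(reference_aln) != len(compared_aln):
        raise ValueError("aligned sequences must have equal length")
    differences = 0  # C
    ref_length = 0   # R: reference residues plus gaps created in the reference
    for r, c in zip(reference_aln, compared_aln):
        if r == gap and c == gap:
            continue  # column holds no residue in either sequence
        ref_length += 1  # a reference residue, or a reference gap counted as one
        if r == gap:
            differences += 1  # (ii) gap in the Reference Sequence
        elif c == gap:
            differences += 1  # (i) reference residue with no aligned partner
        elif r.upper() != c.upper():
            differences += 1  # (iii) aligned residues that differ
    if ref_length == 0:
        raise ValueError("empty alignment")
    return 100.0 * (1.0 - differences / ref_length)
```

For instance, `percent_identity("ACGT-ACGT", "ACGTTACGT")` counts one gap in the Reference Sequence over R = 9, giving about 88.9. Under the minimum-identity rule above, a Compared Sequence meets, say, a 95% threshold if any one of its candidate alignments does, e.g. `any(percent_identity(r, c) >= 95.0 for r, c in alignments)` for a hypothetical list `alignments`.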

As used herein, the terms "portion," "segment," and "fragment," when used in relation to polypeptides, refer to a continuous sequence of residues, such as amino acid residues, which sequence forms a subset of a larger sequence. For example, if a polypeptide were subjected to treatment with any of the common endopeptidases, such as trypsin or chymotrypsin, the oligopeptides resulting from such treatment would represent portions, segments or fragments of the starting polypeptide. When used in relation to polynucleotides, such terms refer to the products produced by treatment of said polynucleotides with any of the common endonucleases, or to any stretch of polynucleotides that could be synthesized.

As used herein and except as noted otherwise, all terms are defined as given below.

In accordance with the present invention, the term "DNA segment" or "DNA sequence" refers to a DNA polymer, in the form of a separate fragment or as a component of a larger DNA construct, which has been derived from DNA isolated at least once in substantially pure form, i.e., free of contaminating endogenous materials and in a quantity or concentration enabling identification, manipulation, and recovery of the segment and its component nucleotide sequences by standard biochemical methods, for example, using a cloning vector. Such segments are provided in the form of an open reading frame uninterrupted by internal nontranslated sequences, or introns, which are typically present in eukaryotic genes. Sequences of non-translated DNA may be present downstream from the open reading frame, where the same do not interfere with manipulation or expression of the coding regions.
The term "coding region" refers to that portion of a gene which either naturally or normally codes for the expression product of that gene in its natural genomic environment, i.e., the region coding in vivo for the native expression product of the gene. The coding region can be from a normal, mutated or altered gene, or can even be from a DNA sequence, or gene, wholly synthesized in the laboratory using methods well known to those of skill in the art of DNA synthesis.

In accordance with the present invention, the term "nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the proteins provided by this invention are assembled from cDNA fragments and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.

The term "expression product" means that polypeptide or protein that is the natural translation product of the gene and any nucleic acid sequence coding equivalents resulting from genetic code degeneracy and thus coding for the same amino acid(s).

The term "fragment," when referring to a coding sequence, means a portion of DNA comprising less than the complete coding region whose expression product retains essentially the same biological function or activity as the expression product of the complete coding region.

The term "primer" means a short nucleic acid sequence that is paired 
with one strand of DNA and provides a free 3'-OH end at which a DNA 
polymerase starts synthesis of a deoxyribonucleotide chain. 


The term "promoter" means a region of DNA involved in binding of RNA polymerase to initiate transcription. The term "enhancer" refers to a region of DNA that, when present and active, has the effect of increasing expression of a different DNA sequence that is being expressed, thereby increasing the amount of expression product formed from said different DNA sequence.

The term "open reading frame (ORF)" means a series of triplets coding for amino acids without any termination codons and is a sequence (potentially) translatable into protein.

As used herein, reference to a DNA sequence includes both single stranded and double stranded DNA. Thus, the specific sequence, unless the context indicates otherwise, refers to the single strand DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and the complement of such sequence.

In carrying out the foregoing assays, relative antineoplastic activity may be ascertained by the extent to which a given chemical agent modulates the expression of genes present in a cancerous cell. Thus, a first chemical agent that modulates the expression of a gene associated with the cancerous state (i.e., a gene that includes one of the sequences disclosed herein and is present in cancerous cells) to a larger degree than a second chemical agent tested by the assays of the invention is thereby deemed to have higher, or more desirable, or more advantageous, antineoplastic activity than said second chemical agent. Alternatively, where first and second chemical agents modulate expression of more than one of said genes, but where the second modulates expression of, for example, five said genes, whereas the first modulates expression of only three of said genes, especially where the three form a subset of the five, then the second chemical agent is deemed a more potent antineoplastic agent than the first. Such antineoplastic activity, as determined using the assays of the present invention, may include combinations of the foregoing possibilities, which are in no way to be considered limiting.
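The two comparison rules above (degree of modulation, and number of genes modulated) can be combined in many ways. The sketch below is one hypothetical scheme, not the disclosed procedure: it counts the signature genes each agent modulates past an assumed two-fold cut-off (|log2 fold change| of at least 1) and breaks ties by the summed magnitude of modulation. The function name and cut-off are illustrative, and the direction of change is ignored here.

```python
def rank_agents(agent_changes: dict[str, dict[str, float]],
                threshold: float = 1.0) -> list[str]:
    """Order agents from most to least apparent antineoplastic activity.

    `agent_changes` maps an agent name to per-gene log2 fold changes for the
    signature genes. A gene counts as modulated when |log2 FC| >= threshold
    (1.0, i.e. two-fold, is an assumed cut-off). Agents are compared first
    on how many genes they modulate, then on total magnitude of modulation.
    """
    def score(agent: str) -> tuple[int, float]:
        modulated = [abs(v) for v in agent_changes[agent].values()
                     if abs(v) >= threshold]
        return len(modulated), sum(modulated)

    return sorted(agent_changes, key=score, reverse=True)
```

With this scoring, an agent modulating five signature genes ranks above one modulating three, and between agents modulating the same number of genes the larger total change ranks first.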
In utilizing these gene sequences for the assays according to the invention, the genes whose activity is to be determined with and without the presence of the compound to be evaluated for antitumor activity may be any one, or several, or any combination of the gene sequences disclosed herein as SEQ ID NO: 1 - 8447. However, how the gene sequences are employed in such assays depends on the pattern of gene expression disclosed for the signature sets for the different organs and tissues. For example, a sequence that is expressed in cancerous cells but not in normal cells will identify a potential anticancer agent by that agent's ability to decrease expression of the sequence, or sequences, in tumor cells. Conversely, a sequence, or sequences, expressed in normal but not tumor cells will identify a potential antitumor agent by its ability to increase expression in the tumor cells. The same relationship holds true where the sequences are expressed in both cancer and normal cells but at a higher level in one than in the other.
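The direction rule just described can be expressed as a simple check. The `Pattern` labels, the function name and the one log2 unit (two-fold) threshold are illustrative assumptions. Tumor-only and normal-only sequences are treated as the extreme cases of higher expression in one cell type.

```python
from enum import Enum


class Pattern(Enum):
    """Hypothetical labels for a sequence's disclosed expression pattern."""
    HIGHER_IN_TUMOR = "higher in tumor"    # includes tumor-only expression
    HIGHER_IN_NORMAL = "higher in normal"  # includes normal-only expression


def moves_toward_normal(pattern: Pattern, log2_fc: float,
                        threshold: float = 1.0) -> bool:
    """True if an agent's change in a tumor cell runs in the desired direction.

    Sequences higher in tumor cells should be decreased by a candidate agent;
    sequences higher in normal cells should be increased. `threshold` is an
    assumed minimum effect size in log2 units.
    """
    if pattern is Pattern.HIGHER_IN_TUMOR:
        return log2_fc <= -threshold
    return log2_fc >= threshold
```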

Based on the expression patterns disclosed for the gene sequences and signature sets disclosed herein, it should be readily apparent to those skilled in the art how to conduct assays for potential antitumor agents using the signature gene sets. The same holds true where the sequences, or signature gene sets, are utilized to determine the cancerous state of a cell or the use of an agent to treat a cancerous condition.

Thus, in one aspect, the present invention relates to a process for screening for an antineoplastic agent comprising the steps of (a) exposing cells to a chemical agent to be tested for antineoplastic activity, and (b) determining a change in expression of at least one gene of a signature gene set, or of a sequence that is at least 95% identical thereto, wherein a change in expression is indicative of antineoplastic activity. Such change in expression is intended to include a change in any activity of the gene, and may be an increase or a decrease thereof. In addition, such change may be a change in expression or other activity of at least 1 such gene, such as 5 or 10 or more of the genes of a signature set, as many as half of such genes, or even all of the genes of a particular gene set.
The gene expression to be measured is commonly assayed using RNA expression as an indicator. Thus, the greater the level of RNA (messenger RNA) detected, the higher the level of expression of the corresponding gene. Gene expression, either absolute or relative, is therefore determined by the relative expression of the RNAs encoded by such genes, including where the expression of several different genes is being quantitatively evaluated and compared, for example, where chemical agents modulate the expression of more than one gene, such as a set of 3, 4, 5, or more genes.

RNA may be isolated from samples in a variety of ways, including: lysis and denaturation with a phenolic solution containing a chaotropic agent (e.g., TRIzol) followed by isopropanol precipitation, ethanol wash, and resuspension in aqueous solution; lysis and denaturation followed by isolation on a solid support, such as a Qiagen resin, and reconstitution in aqueous solution; or lysis and denaturation in non-phenolic, aqueous solutions followed by enzymatic conversion of RNA to DNA template copies.

Normally, prior to applying the processes of the invention, steady state RNA expression levels for the genes, and sets of genes, disclosed herein will have been obtained. It is the steady state level of such expression that is affected by potential anti-neoplastic agents as determined herein. Such steady state levels of expression are readily determined by any method that is sensitive, specific and accurate. Such methods include, but are in no way limited to: real time quantitative polymerase chain reaction (PCR), for example, using a Perkin-Elmer 7700 sequence detection system with gene-specific primer and probe combinations designed using any of several commercially available software packages, such as Primer Express software; solid support based hybridization array technology using appropriate internal controls for quantitation, including filter, bead, or microchip based arrays; and solid support based hybridization arrays using, for example, chemiluminescent, fluorescent, or electrochemical reaction based detection systems.
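The passage names real time quantitative PCR but not how threshold-cycle (Ct) readings are converted into relative expression. One widely used option, the comparative Ct (2^-ΔΔCt) method, is substituted here as an assumption. It normalizes the target transcript to a reference (housekeeping) gene and presumes near-100% amplification efficiency for both amplicons. The function and argument names are hypothetical.

```python
def relative_expression_ddct(ct_target_treated: float, ct_ref_treated: float,
                             ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript, treated vs. control, by 2^-ddCt.

    Each Ct is a real-time PCR threshold cycle; `ref` denotes a reference
    (housekeeping) gene used for normalization. Assumes ~100% amplification
    efficiency, so one cycle corresponds to a two-fold difference.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated    # normalize treated sample
    d_ct_control = ct_target_control - ct_ref_control    # normalize control sample
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)  # < 1: target decreased by treatment; > 1: increased
```

For example, a target crossing threshold two cycles later in treated cells than in controls (with an unchanged reference gene) yields 0.25, a four-fold decrease.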
In one embodiment of the present invention, a set of genes useful in evaluating, screening, or otherwise assaying one or more chemical agents for anti-neoplastic activity in the assays disclosed herein will already have been shown to have differences in the ratios of steady state RNA levels in cancer cells, or tissues, relative to normal, or non-tumorous, cells or tissues; or will have exhibited differences in the expression ratios in tumor samples compared to normal samples between genes in a given subset of the set of genes disclosed herein; or will have gene expression that has increased from undetectable to detectable levels or, conversely, decreased from detectable to undetectable levels, especially where sensitive detection methods are employed.

The genes, and gene sequences, useful in practicing the methods of the present invention are genes that are found to be selectively expressed in, or not expressed in, cancer cells as compared to non-cancer cells, or in which expression is down-regulated or up-regulated, as the case may be, in cancerous cells as compared to non-cancerous cells. Thus, these may include genes, or sets of genes, expressed in cancer cells but absent from, or inactive in, non-cancerous cells, or may include genes, or sets of genes, expressed in non-cancerous cells, but not expressed in cancer cells. Alternatively, the genes useful in practicing the present invention may be more expressed, or less expressed, in a cancerous cell relative to a non-cancerous cell. Such genes are generally those comprising the sequences of SEQ ID NO: 1 - 8447.

All of these sequences are provided herewith on CD-ROM only. 

In accordance with the foregoing, the present invention further relates to a process for determining the cancerous status of a test cell, comprising determining expression in said test cell of at least one gene that includes one of the nucleotide sequences selected from the sequences of SEQ ID NO: 1 - 8447, or a nucleotide sequence that is at least 95% identical thereto, and then comparing said expression to expression of said at least one gene in at least one cell known to be non-cancerous, whereby a difference in said expression indicates that said cell is cancerous.
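The comparison step can be sketched as follows, assuming expression values already normalized across samples. A gene is flagged when its level in the test cell differs from the mean of known non-cancerous cells by at least an assumed two-fold, and the cell is reported as differing when at least `min_genes` genes are flagged. These thresholds, the pseudocount and the function name are hypothetical and would need to be set for each tissue and assay.

```python
import math
from statistics import mean


def differs_from_normal(test_expr: dict[str, float],
                        normal_expr: dict[str, list[float]],
                        min_fold: float = 2.0,
                        min_genes: int = 1,
                        pseudocount: float = 1.0) -> tuple[bool, list[str]]:
    """Flag genes whose expression in a test cell departs from normal cells.

    `test_expr` maps a gene to its level in the test cell; `normal_expr`
    maps a gene to levels measured in cells known to be non-cancerous.
    Both the fold cut-off and `min_genes` are assumed parameters.
    """
    flagged = []
    for gene, value in test_expr.items():
        if gene not in normal_expr:
            continue  # no non-cancerous reference for this gene
        ratio = (value + pseudocount) / (mean(normal_expr[gene]) + pseudocount)
        if abs(math.log2(ratio)) >= math.log2(min_fold):
            flagged.append(gene)  # up or down by at least min_fold
    return len(flagged) >= min_genes, sorted(flagged)
```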


In a particular embodiment, the present invention is directed to a process for determining the cancerous status of a cell to be tested, comprising determining the presence in said cell of at least one gene that includes one of the nucleotide sequences selected from the sequences of SEQ ID NO: 1 - 8447, including sequences having substantial identity or homology to said sequences, or characteristic fragments thereof, or the complements of any of the foregoing, and then comparing the pattern of said gene presence and/or absence with that found for a cell known, or believed, to be non-cancerous, or normal, at least with respect to its genetic complement.


With respect to genes that include at least one of the sequences of SEQ ID NO: 1 - 8447, up-regulation of expression in cancer cells (as compared to non-cancer cells, which may lack said genes, or said gene expression, altogether) is indicative of a cancerous, or potentially cancerous, condition.

In specific embodiments, the present invention relates to embodiments wherein the genetic pattern is the modulation of expression of more than one gene, preferably 3, 4, or 5 genes, and even includes patterns where there is a modulation of expression of as many as 10, or more, genes. Thus, where a genetic pattern is the modulation of expression of 5 genes in a cancerous cell, such as a cancerous colon cell or a cancerous cell of any of the other organs or tissues described herein as related to specifically recited SEQ ID NOs., as compared to a non-cancerous cell of the same tissue or organ, such a pattern indicates that the modulation of expression of those 5 genes is likely an indicator of cancerous status and thereby provides a means of diagnosing a cancerous, or potentially cancerous, status. The absence of a specific set of genes from cancerous cells where said genes are present in otherwise normal cells, especially those of a similar type, is also indicative of a correlation with the cancerous state and thus can likewise be used as a means of diagnosing the cancerous state in other cells suspected of being cancerous.

For example, with respect to colon, especially colon adenocarcinoma,
this would include SEQ ID NO: 1-333 (expressed in normal colon cells but not
in colon cancer cells), SEQ ID NO: 334-522 (expressed at elevated levels in
colon adenocarcinoma but not expressed in normal colon cells), SEQ ID NO:
523-837 (expressed at levels reduced by more than 2.09-fold in colon
adenocarcinoma relative to normal colon cells) and SEQ ID NO: 838-1067
(expressed at levels elevated by at least 2.1-fold in colon adenocarcinoma
relative to normal colon cells). Thus, the above groupings of sequences
represent four signature gene sets for colon; for example, SEQ ID NO:
334-522 represents one signature set, or signature gene set, for colon. The
same is true for each of the organs and tissues listed below with their
respective signature sets or signature gene sets.
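By way of illustration only, the colon signature gene sets above can be represented as a table of inclusive SEQ ID NO ranges and looked up programmatically. The following Python sketch is not part of the disclosed method; the short set labels and the function name `colon_signature_set` are hypothetical, and only the range boundaries are taken from the text.

```python
from bisect import bisect_right
from typing import Optional

# Inclusive SEQ ID NO ranges for the four colon signature gene sets described above.
# The short labels are illustrative names, not terms used in the disclosure.
COLON_SIGNATURE_SETS = [
    (1, 333, "normal_only"),       # expressed in normal colon, not in colon cancer
    (334, 522, "cancer_only"),     # expressed in colon adenocarcinoma, not in normal colon
    (523, 837, "reduced_2.09x"),   # reduced >2.09-fold in colon adenocarcinoma
    (838, 1067, "elevated_2.1x"),  # elevated >=2.1-fold in colon adenocarcinoma
]
_STARTS = [start for start, _, _ in COLON_SIGNATURE_SETS]


def colon_signature_set(seq_id: int) -> Optional[str]:
    """Return the label of the colon signature set containing seq_id, or None."""
    # Find the last range whose start is <= seq_id, then check that range's end.
    i = bisect_right(_STARTS, seq_id) - 1
    if i < 0:
        return None
    start, end, label = COLON_SIGNATURE_SETS[i]
    return label if start <= seq_id <= end else None
```

The same table-driven pattern could be extended with the ranges listed below for the other organs and tissues.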

In the same way as for colon, other gene sequences are indicative of
the cancerous or normal state of other organs and tissues. Thus, as disclosed
herein, these would include SEQ ID NO: 1068-2459 for breast, wherein SEQ
ID NO: 1068-1255 represent genes expressed in infiltrating ductal carcinoma
of the breast that are not expressed at detectable levels in normal breast,
wherein SEQ ID NO: 1256-1459 represent genes expressed in breast
carcinoma that are not expressed at detectable levels in normal breast,
wherein SEQ ID NO: 1459-1664 represent genes expressed in infiltrating
lobular carcinoma of the breast that are not expressed at detectable levels in
normal breast, wherein SEQ ID NO: 1665-2067 represent genes expressed in
normal breast that are absent or not expressed in infiltrating ductal carcinoma
of the breast, and wherein SEQ ID NO: 2068-2459 represent genes
expressed in normal breast cells but absent or not expressed in infiltrating
lobular carcinoma of the breast.

This further includes SEQ ID NO: 2460-3027 for stomach, wherein
SEQ ID NO: 2460-2773 represent genes or gene sequences expressed in
stomach cancer that are not expressed at detectable levels in normal stomach
cells, and wherein SEQ ID NO: 2774-3027 represent genes or gene
sequences expressed in normal stomach cells that are not expressed
at detectable levels in stomach cancer cells.

This further includes SEQ ID NO: 3028-5303 for lung, wherein SEQ ID
NO: 3028-3119 represent genes or gene sequences expressed in lung
adenocarcinoma that are not expressed at appreciable levels in normal lung
cells, wherein SEQ ID NO: 3120-3322 represent genes or gene sequences
expressed in normal lung cells that are not expressed at appreciable levels in
lung adenocarcinoma, wherein SEQ ID NO: 3323-3570 represent genes or
gene sequences expressed in non-cancerous lung tissue that are not
expressed at appreciable levels in malignant lung samples, wherein SEQ ID
NO: 3571-3777 represent genes or gene sequences expressed in malignant
lung samples that are not expressed at appreciable levels in non-malignant
lung cells, wherein SEQ ID NO: 3778-3836 represent genes or gene
sequences expressed in both normal lung and lung adenocarcinoma but
up-regulated by at least about 2-fold in lung adenocarcinoma, wherein
SEQ ID NO: 3837-3980 represent genes or gene sequences expressed at
appreciable levels in normal lung samples but not typically expressed in
lung squamous cell carcinoma, wherein SEQ ID NO: 3981-4215 represent
genes or gene sequences expressed in normal lung tissue but not ordinarily
expressed in neuroendocrine carcinoma of the lung, wherein SEQ ID NO:
4216-4634 represent genes or gene sequences expressed at appreciable
levels in lung neuroendocrine carcinoma that are not expressed at detectable
levels in normal lung, wherein SEQ ID NO: 4635-4877 represent genes or
gene sequences expressed in lung squamous cell carcinoma that are not
expressed at detectable levels in normal lung, and wherein SEQ ID NO:
4878-5303 represent genes or gene sequences expressed in normal lung and
lung adenocarcinoma but down-regulated or under-expressed in lung
adenocarcinoma relative to normal lung tissue.

This further includes SEQ ID NO: 5304-5886 for thyroid, wherein SEQ
ID NO: 5304-5408 represent genes or gene sequences expressed in thyroid
papillary carcinoma that are not found in normal thyroid tissue, wherein SEQ
ID NO: 5409-5602 represent genes or gene sequences expressed in normal
thyroid cells that are not expressed in thyroid papillary carcinoma, and
wherein SEQ ID NO: 5603-5886 represent genes or gene sequences
expressed at a level at least about 5-fold higher in thyroid papillary
carcinoma than in normal thyroid cells.

This further includes SEQ ID NO: 5887-6147 for esophagus, wherein
SEQ ID NO: 5887-6015 represent genes or gene sequences expressed in
esophageal adenocarcinoma but not in normal esophagus from the same
patients, and wherein SEQ ID NO: 6016-6147 represent genes or gene
sequences expressed in normal esophagus but not in esophageal
adenocarcinoma samples from the same patients.

This further includes SEQ ID NO: 6148-6472 for ovary, wherein SEQ
ID NO: 6148-6371 represent genes or gene sequences expressed only in
malignant ovarian carcinomas, wherein SEQ ID NO: 6372-6424 represent
genes or gene sequences expressed only in normal ovarian tissues, and
wherein SEQ ID NO: 6425-6472 represent genes or gene sequences
expressed only in metastatic ovarian cancer.

This further includes SEQ ID NO: 6473-7473 for kidney, wherein SEQ
ID NO: 6473-6615 represent genes or gene sequences expressed in normal
kidney but not in clear cell carcinoma of the kidney, wherein SEQ ID NO:
6616-6685 represent genes or gene sequences expressed in clear cell
carcinoma cells but not in normal kidney cells, wherein SEQ ID NO:
6686-6973 represent genes or gene sequences expressed in normal kidney
cells but not in renal cell carcinoma, wherein SEQ ID NO: 6974-7156
represent genes or gene sequences expressed in renal cell carcinoma but not
in normal kidney, wherein SEQ ID NO: 7157-7229 represent genes or gene
sequences expressed in normal kidney but not in Wilms' tumor cells, and
wherein SEQ ID NO: 7230-7473 represent genes or gene sequences
expressed in Wilms' tumor but not in normal kidney cells.

This further includes SEQ ID NO: 7474-8131 for prostate, wherein SEQ
ID NO: 7475-7833 represent genes or gene sequences expressed in prostate
adenocarcinoma but not appreciably expressed in normal prostate cells,
wherein SEQ ID NO: 7834-8071 represent genes or gene sequences
expressed in normal prostate cells but not expressed at appreciable levels in
prostate adenocarcinoma, and wherein SEQ ID NO: 8072-8131 represent
genes or gene sequences for ribosomal proteins that are highly expressed in
prostate adenocarcinoma but are not expressed at appreciable levels in
normal prostate cells.

This further includes SEQ ID NO: 8132-8447 for pancreas, wherein
SEQ ID NO: 8132-8358 represent genes or gene sequences expressed in
normal pancreas but not in pancreatic adenocarcinoma, and wherein SEQ ID
NO: 8359-8447 represent genes or gene sequences expressed in pancreatic
adenocarcinoma but not in normal pancreas.

The gene patterns indicative of a cancerous state need not be
characteristic of every cell found to be cancerous. Thus, the methods
disclosed herein are useful for detecting the presence of a cancerous
condition within a tissue where less than all cells exhibit the complete pattern.
For example, a set of selected genes, comprising sequences that hybridize
under stringent conditions to, or are at least 90%, preferably 95%, identical
to, at least one of the sequences of SEQ ID NO: 1-8447, wherein the
signature set is composed of genes expressed and/or up-regulated in cancer
cells relative to normal cells, as disclosed above for the signature gene sets
used for practicing the invention, may be found, using appropriate probes,
either DNA or RNA, to be present in as few as 60% of cells derived from a
sample of tumorous, or malignant, tissue while being absent from as few as
60% of cells derived from corresponding non-cancerous, or otherwise normal,
tissue (and thus being present in as many as 40% of such normal tissue
cells). In a preferred embodiment, such gene pattern is found to be present in
at least 70% of cells drawn from a cancerous tissue and absent from at least
70% of a corresponding normal, non-cancerous, tissue sample. In an
especially preferred embodiment, such gene pattern is found to be present in
at least 80% of cells drawn from a cancerous tissue and absent from at least
80% of a corresponding normal, non-cancerous, tissue sample. In a most
preferred embodiment, such gene pattern is found to be present in at least
90% of cells drawn from a cancerous tissue and absent from at least 90% of a
corresponding normal, non-cancerous, tissue sample. In an additional
embodiment, such gene pattern is found to be present in 100% of cells
drawn from a cancerous tissue and absent from 100% of a corresponding
normal, non-cancerous, tissue sample, although the latter embodiment may
represent a rare occurrence.
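The tiered thresholds above can be expressed as a simple check. The Python sketch below is for illustration only and assumes that the percentage of cells carrying the pattern has already been measured in each sample (for example, by probe hybridization on individual cells); the function names `percent` and `embodiment_tier` are hypothetical.

```python
from typing import Optional

# Tiers recited above, from most to least stringent (percent of cells).
TIERS = (100, 90, 80, 70, 60)


def percent(positive_cells: int, total_cells: int) -> float:
    """Percentage of cells in a sample that carry the gene pattern."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * positive_cells / total_cells


def embodiment_tier(pct_tumor_with_pattern: float,
                    pct_normal_without_pattern: float) -> Optional[int]:
    """Return the most stringent tier met by both samples, or None if below 60%."""
    for tier in TIERS:
        if pct_tumor_with_pattern >= tier and pct_normal_without_pattern >= tier:
            return tier
    return None
```

For a signature set that is expressed in normal rather than cancerous cells, the two arguments would instead be the percentage of normal cells with the pattern and the percentage of tumor cells without it.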

Conversely, where the signature set is expressed or up-regulated in
normal cells relative to cancerous cells, as disclosed herein, expression in
the normal cells but not in suspected cancerous cells may confirm a
cancerous state in the suspected cancerous sample. The same is true for
the assays disclosed herein for potential antitumor agents.

Although the presence or absence of expression of one or more
selected gene sequences may be indicative of a cancerous status for a given
cell, the mere presence or absence of such a gene pattern may not alone be
sufficient to achieve a malignant condition, and thus the level of expression
of such gene pattern may also be a significant factor in determining the
attainment of a cancerous state. Thus, while a pattern of genes may be
present in both cancerous and non-cancerous cells, the level of expression,
as determined by any of the methods disclosed herein, all of which are well
known in the art, may differ between the cancerous and the non-cancerous
cells. Thus, it is essential to also determine the level of expression of one
or more of said genes as a separate means of diagnosing the presence of a
cancerous status for a given cell, group of cells, or tissue, either in culture
or in situ.
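Because the text relies on levels of expression rather than presence alone, one common way to compare levels is a log2 fold change with a small pseudocount. The Python sketch below shows such a comparison under stated assumptions: the expression values are already normalized, and the pseudocount of 1 and the default 2-fold cutoff are hypothetical choices (the specific sets above recite cutoffs such as about 2-fold, 2.1-fold, or about 5-fold).

```python
import math


def log2_fold_change(test_level: float, reference_level: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of test to reference expression; the pseudocount avoids division by zero."""
    return math.log2((test_level + pseudocount) / (reference_level + pseudocount))


def classify_change(test_level: float, reference_level: float,
                    fold_threshold: float = 2.0, pseudocount: float = 1.0) -> str:
    """Label a gene 'up', 'down' or 'unchanged' in test cells relative to reference cells."""
    lfc = log2_fold_change(test_level, reference_level, pseudocount)
    cutoff = math.log2(fold_threshold)
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"
```

Working on the log2 scale makes an increase and a decrease of the same magnitude symmetric about zero, so one cutoff serves both directions.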

In accordance with the invention disclosed herein, the identification of
an anticancer agent using the signature gene sets for the various organs and
tissues described above is based on patterns of modulation of such genes,
because an increase or decrease in expression of any single gene in the
presence of a potential agent may or may not be meaningful. Thus, the more
genes in a gene set that are affected by said agent, the more likely said
agent is an effective therapeutic agent.

In addition, different agents may have different abilities to affect the
genes of a signature gene set. For example, suppose a potential therapeutic
agent, say, agent A, causes a gene or group of genes of a characteristic or
signature gene set, or even all of the genes of said gene set, to exhibit
decreased expression, such as where a lower amount of mRNA is expressed
from said gene(s), or less protein is produced from said mRNA. If a second
potential agent, say, agent B, while modulating the activity of the same or
related genes, causes said expression to be reduced to half the level
observed with agent A, such as where only half as much mRNA is
transcribed, or only half as much protein is translated from said mRNA, as
for agent A, then agent B is considered to have twice as much therapeutic
potential as agent A.
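The comparison of agent A and agent B can be read as a ratio of residual expression: if agent B leaves half as much expression as agent A, the ratio is 2. The Python sketch below extends that reading to a whole signature set by taking the geometric mean of per-gene ratios; the geometric-mean aggregation and the function name `relative_potential` are assumptions made for illustration, not part of the disclosure.

```python
from statistics import geometric_mean
from typing import Mapping


def relative_potential(residual_a: Mapping[str, float],
                       residual_b: Mapping[str, float]) -> float:
    """Estimate how many times more therapeutic potential agent B has than agent A.

    residual_a / residual_b map each signature gene to the expression level
    (e.g. amount of mRNA or protein) remaining after treatment with agent A or B.
    """
    genes = residual_a.keys() & residual_b.keys()
    if not genes:
        raise ValueError("no genes in common")
    ratios = []
    for gene in sorted(genes):
        a, b = residual_a[gene], residual_b[gene]
        if a <= 0 or b <= 0:
            raise ValueError(f"residual expression for {gene} must be positive")
        # A ratio > 1 means agent B suppressed this gene further than agent A did.
        ratios.append(a / b)
    return geometric_mean(ratios)
```

With a single gene whose residual expression under agent B is half that under agent A, the function returns 2, matching the example in the text.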

Such modulation or change of activity as determined using the assays
disclosed herein may include either an increase or a decrease in activity of
said genes or gene sequences. Thus, where a gene is expressed in cancer
cells but not in normal cells, or is up-regulated in cancer cells relative to
normal cells, of the same organ or tissue type, an agent that down-regulates
said gene or genes, or gene sequences, or prevents their expression entirely,
is considered a potential antitumor agent within the present disclosure.
Conversely, where an agent causes expression of a gene or genes, or gene
sequences, expressed in normal cells but not in cancer cells, or where said
agent up-regulates a gene or genes, or gene sequences, that are expressed
in normal cells but not in cancer cells, or are up-regulated in normal cells but
not in cancer cells, of the same organ or tissue type, said agent is considered
to be a potential antitumor agent within the present disclosure.
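The two favorable directions of modulation described above, together with the earlier statement that the more signature genes an agent affects the more likely it is to be effective, suggest a simple screening score: the fraction of signature genes moved in the therapeutically favorable direction. The Python sketch below is a hypothetical scoring rule; the labels `cancer_up` and `normal_up`, the 2-fold minimum change, and the function name are assumptions.

```python
from typing import Mapping


def antitumor_score(signature: Mapping[str, str],
                    untreated: Mapping[str, float],
                    treated: Mapping[str, float],
                    min_fold: float = 2.0) -> float:
    """Fraction of signature genes modulated in the favorable direction by an agent.

    signature maps each gene to 'cancer_up' (expressed or up-regulated in cancer
    cells) or 'normal_up' (expressed or up-regulated in normal cells). Expression
    values are measured in cancer cells before and after exposure to the agent.
    """
    if not signature:
        raise ValueError("signature set is empty")
    hits = 0
    for gene, kind in signature.items():
        before, after = untreated[gene], treated[gene]
        if kind == "cancer_up":
            # Favorable: the agent down-regulates (or abolishes) a cancer-associated gene.
            if before > after and after * min_fold <= before:
                hits += 1
        elif kind == "normal_up":
            # Favorable: the agent induces or up-regulates a normal-associated gene.
            if after > before and after >= before * min_fold:
                hits += 1
        else:
            raise ValueError(f"unknown signature label: {kind}")
    return hits / len(signature)
```

Candidate agents could then be ranked by this score, with higher scores corresponding to broader favorable modulation of the signature set.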

The present invention also relates to a method for producing a
product comprising identifying an agent according to one of the disclosed
processes for identifying such an agent, wherein said product is the data
collected with respect to said agent as a result of said identification
process and wherein said data is sufficient to convey the chemical
character and/or structure and/or properties of said agent.

In accordance with the foregoing, the present invention further relates
to a process for determining the cancerous status of a cell to be tested,
comprising determining the level of expression in said cell of at least one gene
that includes one of the nucleotide sequences selected from the sequences of
SEQ ID NO: 1-8447, including sequences substantially identical to said
sequences, or characteristic fragments thereof, or the complements of any of
the foregoing, and then comparing said expression to that of a cell known to
be non-cancerous, whereby a difference in said expression indicates that
said cell to be tested is cancerous.

In specific embodiments of the present invention, said expression is
determined for more than one of said genes, such as 2, 3, 4, 5, or more such
genes, considered as a set, and even for a set of as many as 10 such genes.
A set of genes, for example, 5 such genes, may be found to be expressed at
certain levels in cancer cells but at lower levels (or not at all) in
non-cancerous, or normal, cells. Conversely, a set of, for example, 5 such
genes may be found to be expressed in normal (i.e., non-cancerous) cells
but at lower levels (or not at all) in cancer cells. Thus, by determining the
set or pattern of genes expressed in cancer cells but expressed at lower
levels (or not at all) in non-cancerous cells, or vice versa, a method is
achieved for diagnosing cancerous conditions wherein said genes are
selected from those that include one of the sequences, or fragments of
sequences, including complementary sequences, selected from SEQ ID NO:
1-8447.

In specific embodiments, the processes of the present invention
include embodiments wherein said expression is the expression of more than
one said gene, possibly at least three said genes, preferably at least 5 said
genes, or even as many as 10 or more said genes.

In other embodiments, the process of the invention relates to situations
wherein expression of said genes is higher in said cells to be tested than in
said non-cancerous cells, or wherein expression of said genes is lower in said
cells to be tested than in said non-cancerous cells.
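A multi-gene diagnostic call of the kind described above can be sketched as counting how many genes of a set differ from a known non-cancerous reference in the expected direction. In the Python sketch below, the 2-fold cutoff and the requirement that at least 5 genes agree are illustrative assumptions (the text recites sets of 3, 5, or 10 or more genes), and the function name `concordant_genes` is hypothetical.

```python
from typing import Mapping, Tuple


def concordant_genes(expected: Mapping[str, str],
                     test: Mapping[str, float],
                     reference: Mapping[str, float],
                     fold: float = 2.0,
                     min_genes: int = 5) -> Tuple[int, bool]:
    """Count genes whose test-cell expression differs from the non-cancerous
    reference in the expected direction ('higher' or 'lower') by at least `fold`.

    Returns (count, call), where call is True if count >= min_genes.
    """
    count = 0
    for gene, direction in expected.items():
        t, r = test[gene], reference[gene]
        if direction == "higher" and t > r and t >= fold * r:
            count += 1
        elif direction == "lower" and r > t and r >= fold * t:
            count += 1
    return count, count >= min_genes
```

Mixing 'higher' and 'lower' genes in one expected pattern covers both of the embodiments described above within a single call.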

Using the methods disclosed herein, the cancerous, or non-cancerous,
status of a cell, or tissue sample, may be readily ascertained. For example,
colon or other cancers can be readily detected using the methods of the
present invention.

In accordance with the invention, although gene expression for a gene
that includes as a portion thereof one of the nucleotide sequences of SEQ ID
NO: 1-8447 is preferably determined by use of a probe that is a fragment of
such nucleotide sequence, it is to be understood that the probe may be
formed from a different portion of the gene. This is possible because, for
each gene of the signature sets of the present invention, the nucleotide
sequence disclosed with respect to a specific sequence ID number is only a
portion of the nucleotide sequence of the gene. As a result, expression of
the gene may be determined by use of a nucleotide probe that hybridizes to
messenger RNA (mRNA) transcribed from a portion of the gene other than
the specific nucleotide sequence disclosed with reference to a sequence ID
number as recited herein.
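Because a probe must be complementary to the mRNA it detects, a probe drawn from a part of the gene outside the listed fragment is the reverse complement of that other region. The Python sketch below illustrates this; it assumes the transcript is available as a sense-strand sequence written in the DNA alphabet, and the function names and the toy sequences in the checks are hypothetical (not taken from any SEQ ID NO).

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_probe(sense_region: str) -> str:
    """Reverse complement of a sense-strand region; hybridizes to the corresponding mRNA."""
    return sense_region.translate(_COMPLEMENT)[::-1].upper()


def probe_outside_fragment(transcript: str, frag_start: int, frag_end: int,
                           length: int) -> Optional[str]:
    """Design a probe of `length` bases from transcript sequence that does not
    overlap the listed fragment [frag_start, frag_end) (0-based, end-exclusive).

    Tries the region immediately downstream of the fragment first, then upstream.
    """
    if len(transcript) - frag_end >= length:
        start = frag_end
    elif frag_start >= length:
        start = frag_start - length
    else:
        return None
    return antisense_probe(transcript[start:start + length])


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, a common first check in probe design."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

A practical design would also check melting temperature and uniqueness against the rest of the transcriptome; those steps are omitted here.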

The present invention further relates to a process for determining a
cancer initiating, facilitating or suppressing gene comprising the steps of
contacting a cell with a cancer modulating agent and determining a change in
expression of a gene selected from the group consisting of the gene
sequences of SEQ ID NO: 1-8447 and thereby identifying said gene as
being a cancer initiating or facilitating gene.

Some or all of the genes within the signature gene sets disclosed
herein as SEQ ID NO: 1-8447 may play a direct role in the initiation or
progression of cancer, or even of other diseases and disease processes.
Because changes in expression of these genes (either up-regulation or
down-regulation) are linked to the disease state (i.e., cancer), the change in
expression may contribute to the initiation or progression of the disease. For
example, if a gene that is up-regulated is an oncogene, or if a gene that is
down-regulated is a tumor suppressor, such a gene provides a means of
screening for small molecule therapeutics beyond screens based upon
expression output alone. For example, genes that display up-regulation in
cancer and whose elevated expression contributes to initiation or progression
of disease represent targets in screens for small molecules that inhibit or
block their function. Examples include, but are not limited to, screens based
on kinase inhibition, on cellular proliferation, and on substrate analogs that
block the active site of protein targets. Similarly, genes that display
down-regulation in cancer and whose absence results in initiation or
progression of disease are valuable candidates for gene therapy.

It should be noted that there are a variety of different contexts in which
genes have been evaluated as being involved in the cancerous process.
Thus, some genes may be oncogenes and encode proteins that are directly
involved in the cancerous process and thereby promote the occurrence of
cancer in an animal. In addition, other genes may serve to suppress the
cancerous state in a given cell or cell type and thereby work against a
cancerous condition forming in an animal. Other genes may simply be
involved, either directly or indirectly, in the cancerous process or condition
and may serve in an ancillary capacity with respect to the cancerous state.
All such types of genes are deemed to be among those determined in
accordance with the invention as disclosed herein. Thus, the gene
determined by said process of the invention may be an oncogene, or the
gene determined by said process may be a cancer facilitating gene, the
latter including a gene that directly or indirectly affects the cancerous
process, either in the promotion of a cancerous condition or in facilitating the
progress of cancerous growth or otherwise modulating the growth of cancer
cells, either in vivo or ex vivo. In addition, the gene determined by said
process may be a cancer suppressor gene, which gene works either directly
or indirectly to suppress the initiation or progress of a cancerous condition.
Such genes may work indirectly where their expression alters the activity of
some other gene or gene expression product that is itself directly involved in
initiating or facilitating the progress of a cancerous condition. For example, a
gene that encodes a polypeptide, either wild-type or mutant, which
polypeptide acts to suppress a tumor suppressor gene, or its expression
product, will thereby act indirectly to promote tumor growth.

In accordance with the foregoing, the process of the present invention
includes cancer modulating agents that are themselves either polypeptides or
small chemical entities that affect the cancerous process, including initiation,
suppression or facilitation of tumor growth, either in vivo or ex vivo. Said
cancer modulating agent may have the effect of increasing gene expression,
or said cancer modulating agent may have the effect of decreasing gene
expression, as such terms have been described herein.

In keeping with the disclosure herein, the present invention also relates 
to a process for treating cancer comprising contacting a cancerous cell with 
an agent having activity against an expression product encoded by a gene 
sequence selected from the group consisting of SEQ ID NO: 1 - 8447. 

Thus, some or all of the genes within these signature gene sets
represent individual targets for therapeutic intervention, based at least in part
on their pattern(s) of expression. Examples include genes within the
signature gene sets that encode cell surface molecules and are up-regulated
in cancer as compared to normal cells. The proteins encoded by such genes,
due to their elevated expression in cancer cells, represent highly useful
therapeutic targets for "targeted therapies" utilizing such affinity structures
as, for example, antibodies coupled to some cytotoxic agent. In such
methodology, it is advantageous that nothing need be known about the
endogenous ligands or binding partners for such cell surface molecules.
Rather, an antibody or equivalent molecule that can specifically recognize the
cell surface molecule (which could include an artificial peptide, a surrogate
ligand, and the like), and that is coupled to some agent that can induce cell
death or a block in cell cycling, offers therapeutic promise against these
proteins. Thus, such approaches include the use of so-called suicide
"bullets" against intracellular proteins.

The process of the present invention includes embodiments of the
above-recited process wherein said cancer cell is contacted in vivo as well as
ex vivo, preferably wherein said agent comprises a portion, or is part of an
overall molecular structure, having affinity for said expression product. In one
such embodiment, said portion having affinity for said expression product is
an antibody, especially where said expression product is a polypeptide or
oligopeptide, or comprises an oligopeptide portion, or comprises a polypeptide.

Such an agent can therefore be a single molecular structure,
comprising both an affinity portion and an anti-cancer activity portion,
wherein said portions are derived from separate molecules, or molecular
structures, possessing such activity when separated, and wherein such
agent has been formed by combining said portions into one larger molecular
structure, such as where said portions are combined into the form of an
adduct. Said anti-cancer and affinity portions may be joined covalently, such
as in the form of a single polypeptide, or polypeptide-like, structure, or may be
joined non-covalently, such as by hydrophobic or electrostatic interactions,
such structures having been formed by means well known in the chemical
arts. Alternatively, the anti-cancer and affinity portions may be formed from
separate domains of a single molecule that exhibits, as part of the same
chemical structure, more than one activity, wherein one of the activities is
against cancer cells, or tumor formation or growth, and the other activity is
affinity for an expression product produced by expression of genes related to
the cancerous process or condition.

In one embodiment of the present invention, a chemical agent, such as
a protein or other polypeptide, is joined to an agent, such as an antibody,
having affinity for an expression product of a cancerous cell, such as a
polypeptide or protein encoded by a gene related to the cancerous process,
especially a gene sequence selected from the group consisting of the
sequences of SEQ ID NO: 1-8447. In a specific embodiment, said
expression product is a cell surface receptor, such as a protein or glycoprotein
or lipoprotein, present on the surface of a cancer cell, such as where it is part
of the plasma membrane of said cancer cell, and acts as a therapeutic target
for the affinity portion of said anticancer agent and where, after binding of the
affinity portion of such agent to the expression product, the anti-cancer portion
of said agent acts against said expression product so as to neutralize its
effects in initiating, facilitating or promoting tumor formation and/or growth. In
a separate embodiment of the present invention, binding of the agent to said
expression product may, without more, have the effect of deterring cancer
promotion, facilitation or growth, especially where the presence of said
expression product is related, either intimately or only in an ancillary manner,
to the development and growth of a tumor. Thus, where the presence of said
expression product is essential to tumor initiation and/or growth, binding of
said agent to said expression product will have the effect of negating said
tumor promoting activity. In one such embodiment, said agent is an
apoptosis-inducing agent that induces cell suicide, thereby killing the cancer
cell and halting tumor growth.

As disclosed herein, the present invention further relates to a process
for determining a cancer initiating, facilitating or suppressing gene in a cancer
cell comprising determining a change in expression of a gene sequence,
especially where said sequence is one selected from the group consisting of
the sequences of SEQ ID NO: 1-8447.

Thus, the processes of the present invention take advantage of the
correlation of changes in mRNA expression profiles of these signature gene
sets with potential (depending on the form of cancer) changes in DNA copy
number of the chromosomal regions wherein these genes are located. The
precise nature of the change in mRNA expression (e.g., a signature set of
genes that are up-regulated at the transcriptional level) may also indicate the
nature of the change in DNA copy number for the genomic regions in which
these genes are located (e.g., an amplification of the genomic DNA region
that contains the involved gene or genes).

Cancers typically contain chromosomal rearrangements, which
represent translocations, amplifications, or deletions of specific regions of
genomic DNA. A recurrent chromosomal rearrangement that is associated
with a specific stage and type of cancer generally affects a gene (or possibly
genes) that plays a direct and critical role in the initiation or progression of the
disease. Many of the known oncogenes or tumor suppressor genes that play
direct roles in cancer have either been initially identified based upon their
positional cloning from a recurrent chromosomal rearrangement or have been
demonstrated to fall within a rearrangement subsequent to their cloning by
other methods. In such cases, these genes display corresponding changes
(for example, amplification of an oncogene or deletion of a tumor suppressor
gene) both at the level of DNA copy number and at the level of transcriptional
expression at the mRNA level.

At least some of the genes that are contained within the signature gene
sets disclosed herein (SEQ ID NO: 1-8447) display changes in their mRNA
expression profiles (depending on the precise reading frame involved) within
cancer samples due, in part, to changes in their DNA copy number as a result
of specific chromosomal rearrangements in those cancer cells. Two utilities
follow from this: (i) the genes contained within these signature gene sets
offer a time-saving shortcut to the identification of novel chromosomal
rearrangements, amplifications, or deletions that are associated with cancer,
and/or (ii) they represent key genes that are affected by such chromosomal
rearrangements, amplifications, or deletions and, therefore, play a key role in
the initiation or progression of the disease. Genes within the signature sets
that identify changes in DNA copy number (based upon their changes in
expression at the mRNA level) thus afford an entry point into other forms of
diagnostic assay for the initiation, staging, or progression of cancer to be
conducted in tissue samples at the DNA level (e.g., if gene X identifies a novel
chromosomal amplification associated with cancer, then the specific
chromosomal region defined by gene X would serve as the basis for a
diagnostic assay for cancer, in which genomic DNA is extracted from tissue
samples and evaluated for the presence of the specific amplification), and
also into the rapid positional cloning of genes that play vital and direct roles in
the initiation or progression of cancer.
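The idea that co-regulated signature genes lying close together on a chromosome may point to an underlying amplification or deletion can be sketched by grouping genes by chromosome and direction of change and reporting runs of nearby genes. The Python sketch below assumes that genomic coordinates are available for each gene; the 2 Mb gap, the minimum of 3 genes per run, and the function name `candidate_regions` are hypothetical, and any region it reports would only be a candidate for confirmation at the DNA level.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Gene = Tuple[str, str, int, str]  # (name, chromosome, position_bp, 'up' or 'down')
Region = Tuple[str, int, int, str, List[str]]


def candidate_regions(genes: Iterable[Gene], max_gap_bp: int = 2_000_000,
                      min_genes: int = 3) -> List[Region]:
    """Group same-direction genes on a chromosome into runs whose neighbors lie
    within max_gap_bp of each other; report runs of at least min_genes genes.

    'up' runs are candidate amplifications, 'down' runs candidate deletions.
    """
    by_key = defaultdict(list)
    for name, chrom, pos, direction in genes:
        by_key[(chrom, direction)].append((pos, name))

    regions: List[Region] = []
    for (chrom, direction), items in sorted(by_key.items()):
        items.sort()
        runs = [[items[0]]]
        for item in items[1:]:
            if item[0] - runs[-1][-1][0] <= max_gap_bp:
                runs[-1].append(item)  # close to the previous gene: extend the run
            else:
                runs.append([item])    # gap too large: start a new run
        for run in runs:
            if len(run) >= min_genes:
                regions.append((chrom, run[0][0], run[-1][0], direction,
                                [name for _, name in run]))
    return regions
```

Each reported region gives a chromosome and coordinate span that could then be tested directly on genomic DNA, as in the gene X example above.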
In one embodiment of the present invention, said change in expression
may be determined by determining a change in gene copy number, wherein
said change in copy number is an increase in copy number or wherein said
change in copy number is a decrease in copy number.
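A minimal way to turn a measured signal into an increase or decrease call is to scale the tumor-to-normal ratio by the normal ploidy. The Python sketch below assumes a diploid reference, a signal that scales linearly with copy number, and a tumor sample free of contaminating normal cells; these assumptions, the cutoffs of 3 and 1 copies, and the function name `copy_number_call` are illustrative only. When the signal is mRNA expression rather than genomic DNA, the estimate is at best an indirect indication of copy number.

```python
from typing import Tuple


def copy_number_call(tumor_signal: float, normal_signal: float, ploidy: int = 2,
                     gain_at: float = 3.0, loss_at: float = 1.0) -> Tuple[str, float]:
    """Classify a copy-number change from tumor and matched-normal signals.

    Assumes the normal sample carries `ploidy` copies and that the signal scales
    linearly with copy number. Returns (call, estimated_copies).
    """
    if normal_signal <= 0:
        raise ValueError("normal_signal must be positive")
    estimated = ploidy * tumor_signal / normal_signal
    if estimated >= gain_at:
        return "increase", estimated
    if estimated <= loss_at:
        return "decrease", estimated
    return "no change", estimated
```

In practice, tumor purity and normalization strongly affect such ratios, which is why the cutoffs here are parameters rather than fixed values.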

Such change in gene copy number may be determined by determining
a change in expression of messenger RNA encoded by a particular gene
sequence, especially where said sequence is one selected from the group
consisting of the sequences of SEQ ID NO: 1-8447. Also in accordance with
the present invention, said gene may be a cancer initiating gene, a cancer
facilitating gene, or a cancer suppressing gene. In carrying out the methods of
the present invention, a cancer facilitating gene is a gene that, while not
directly initiating or suppressing tumor formation or growth, acts, such as
through the actions of its expression product, to direct, enhance, or
otherwise facilitate the progress of the cancerous condition, including where
such gene acts against genes, or gene expression products, that would
otherwise have the effect of decreasing tumor formation and/or growth.

Thus, the present invention also provides a process for diagnosing a
cancerous cell comprising determining a cancer initiating, facilitating or
suppressing gene according to the above-described methods.

The present invention also relates to a process for treating cancer
comprising inserting into a cancerous cell a gene construct comprising an
anti-cancer gene operably linked to a promoter or enhancer element such that
expression of said anti-cancer gene causes suppression of said cancer and
wherein said promoter or enhancer element is a promoter or enhancer
element modulating a gene sequence selected from the group consisting of
the sequences of SEQ ID NO: 1-8447.

Thus, the signature sets or signature gene sets disclosed herein are
useful in identifying genetic regulatory elements within the promoters of the
genes contained within the signature sets that are specific to normal tissue
and/or the corresponding cancer. In accordance with this, each signature set
is a collection of genes that share a gross common pattern of transcriptional
regulation in cancer vs. normal (e.g., a signature set of genes that are
transcriptionally up-regulated in cancer).

In one such embodiment, analyzing and comparing the DNA
sequences of the promoter regions of all the genes contained within the
signature set serves to identify conserved stretches or motifs of sequence
within subsets of genes that represent cis-acting elements that specifically
drive a form of gene expression (e.g., increased transcriptional expression in
cancer). Such cis-acting regulatory elements, once identified, are then
available for use in driving the cancer-specific expression of suicide genes or
toxins via gene therapy using technology already well known in the art.
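
The promoter comparison described above can be illustrated, in its simplest form, as a search for exact k-mers shared by the promoters of several genes in a signature set. This is a minimal sketch under stated assumptions, not the method of the invention: the function name shared_kmers, the toy promoter fragments, the k-mer length and the support cutoff are all hypothetical, and only the given strand is scanned. A practical analysis would typically also consider the reverse strand, degenerate motifs and background sequence composition.

```python
def shared_kmers(promoters: dict[str, str], k: int,
                 min_support: int) -> dict[str, set[str]]:
    """Map each k-mer found in at least `min_support` promoters to those genes.

    Hypothetical helper: exact matches on the given strand only; k-mers
    containing 'N' (unknown base) are skipped.
    """
    occurrences: dict[str, set[str]] = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        # Count each k-mer once per promoter, however often it repeats.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers:
            if "N" not in kmer:
                occurrences.setdefault(kmer, set()).add(gene)
    return {kmer: genes for kmer, genes in occurrences.items()
            if len(genes) >= min_support}


# Hypothetical toy promoter fragments for three genes of one signature set.
promoters = {
    "geneA": "TTGACGTCAAGG",
    "geneB": "CCTGACGTCATT",
    "geneC": "GGGTTTAAACCC",
}
hits = shared_kmers(promoters, k=8, min_support=2)
print(sorted(hits))                  # ['TGACGTCA']
print(sorted(hits["TGACGTCA"]))      # ['geneA', 'geneB']
```

Raising min_support toward the size of the signature set keeps only elements common to most of its genes, at the cost of missing elements shared by smaller subsets.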

In separate embodiments, said anti-cancer gene is a cancer 
suppressor gene or encodes a polypeptide having anticancer activity, 
especially where said polypeptide has apoptotic activity. 

In additional embodiments of the present invention, insertion of said
gene construct into a cancerous cell is accomplished in vivo, for example
using a viral or plasmid vector. Such methods can also be applied in vitro.
Thus, the methods of the present invention are readily applicable to
different forms of gene therapy, either where cells are genetically modified ex
vivo and then administered to a host or where the gene modification is
conducted in vivo using any of a number of suitable methods involving vectors
especially suited to such therapies, such as special viral vectors,
including adeno-associated viruses and adenoviruses, as well as retroviruses
and specially constructed plasmids. The use of
these and other vectors is well known to those skilled in the art and need not
be described further.

The present invention also relates to a process for determining
functionally related genes comprising contacting one or more gene sequences
selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447
with an agent that modulates expression of more than one gene in such group
and thereby determining a subset of genes of said group.

In accordance with the present invention, said functionally related
genes are genes modulating the same metabolic pathway or said genes are
genes encoding functionally related polypeptides. In one such embodiment,
said genes are genes whose expression is modulated by the same
transcriptional activator or enhancer sequence, especially where said
transcriptional activator or enhancer increases, or otherwise modulates, the
activity of a gene sequence selected from the group consisting of SEQ ID NO: 1 - 8447.

Thus, the signature gene sets disclosed herein also find use as the
basis for small molecule assays for therapeutics based upon changes in
expression profile. In one such embodiment, small molecule screens serve to
identify changes in expression of genes within a signature set and thereby
provide a tool for the identification of specific functional pathways and a
means of assigning defined functions to novel genes.

In accordance with the foregoing, monitoring the transcriptional
expression of the genes contained within the signature sets disclosed herein
forms the basis of an assay for small molecule therapeutics. For example,
where a signature set consists of genes that are transcriptionally
up-regulated in cancer cells compared to normal cells, such screens facilitate
the identification of small molecules that down-regulate the expression of the
genes of the signature set within cancer cells. While such therapeutics make
a cancer cell "look" more normal, based upon the expression of the genes
within the signature set, in practice not all genes within a signature set
respond identically to each small molecule within a chemical compound library.
For example, an average signature set may contain 200 different genes, and the
expression of all 200 genes may be monitored in response to a library of some
50,000 chemical compounds. Subsets of genes within the signature set may then
be found to change their patterns of expression consistently in response to
particular chemicals; for instance, a group of 10 genes may always change
expression in a coordinated way, such that any compound that down-regulates
one gene within the group also down-regulates the other 9 genes.
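
One way to find such coordinated subsets is to correlate each gene's response profile across the compound library and group genes whose profiles are strongly correlated. The sketch below assumes a matrix of log2 fold changes (genes by compounds), a Pearson correlation cutoff of 0.9 and single-linkage grouping; these choices, the function name coordinated_subsets and the toy data are hypothetical illustrations rather than the procedure disclosed here.

```python
import numpy as np


def coordinated_subsets(log_changes: np.ndarray, genes: list[str],
                        min_r: float = 0.9) -> list[set[str]]:
    """Group genes whose responses across a compound library move together.

    log_changes has shape (n_genes, n_compounds); row i holds the (hypothetical)
    log2 fold changes of genes[i] for every compound screened. Genes are linked
    when their Pearson correlation is >= min_r, and linked genes are merged
    transitively (single linkage). Only groups of two or more genes are returned.
    """
    corr = np.corrcoef(log_changes)  # rows are treated as variables (genes)
    unvisited = set(range(len(genes)))
    groups = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            linked = {j for j in unvisited if corr[i, j] >= min_r}
            unvisited -= linked
            component |= linked
            stack.extend(linked)
        if len(component) > 1:
            groups.append({genes[i] for i in component})
    return groups


# Hypothetical responses of four genes to four compounds.
genes = ["g1", "g2", "g3", "g4"]
log_changes = np.array([
    [-2.0, 0.1, -1.5, 0.0],
    [-1.8, 0.2, -1.4, 0.1],
    [0.5, -1.0, 0.3, 2.0],
    [0.4, -1.1, 0.2, 1.9],
])
subsets = coordinated_subsets(log_changes, genes)
print(sorted(sorted(s) for s in subsets))  # [['g1', 'g2'], ['g3', 'g4']]
```

Only positive correlation links genes here, so genes that respond in opposite directions to the same compounds fall into different groups; a cutoff on the absolute correlation would merge them instead.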

Such subsets or subgroups of genes within each signature set that
change their expression in a coordinated way in response to chemical
compounds represent genes that are located within a common metabolic,
signaling, physiological, or functional pathway, so that by analyzing and
identifying such subsets one can (a) assign known genes and novel genes to
specific pathways and (b) identify specific functions and functional roles for
novel genes that are grouped into pathways with genes whose
functions are already characterized or described. For example, one might
identify a subgroup of 10 genes within a signature set (5 known genes and 5
novel genes) that change expression in a coordinated fashion and for which
the 5 known genes are involved in apoptosis, thereby implicating the other 5
novel genes as playing a role in apoptotic cellular processes. Therefore, the
processes disclosed according to the present invention provide at once a
novel means of assigning function to genes, i.e., a novel method of functional
genomics, and a means for identifying chemical compounds that have
potential therapeutic effects on specific cellular pathways. Such chemical
compounds may have therapeutic relevance to a variety of diseases outside
of cancer as well, in cases where such diseases are known or are
demonstrated to involve the specific cellular pathway that is affected.
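
The apoptosis example above amounts to a guilt-by-association rule: when most of the characterized genes in a coordinated subset share an annotation, that annotation is proposed for the uncharacterized members. A minimal sketch, assuming a simple gene-to-annotation dictionary and a hypothetical majority cutoff (min_fraction); the function name propose_functions, the gene names and the annotations are placeholders.

```python
from collections import Counter


def propose_functions(subset: set[str], annotations: dict[str, str],
                      min_fraction: float = 0.5) -> dict[str, str]:
    """Propose a function for uncharacterized genes in a coordinated subset.

    Hypothetical rule: if at least min_fraction of the annotated (known)
    members share one annotation, that annotation is proposed for every
    unannotated member. Returns an empty dict when no annotation qualifies.
    """
    known = [g for g in subset if g in annotations]
    if not known:
        return {}
    term, count = Counter(annotations[g] for g in known).most_common(1)[0]
    if count / len(known) < min_fraction:
        return {}
    return {g: term for g in subset if g not in annotations}


# Hypothetical subset mirroring the text: 5 known apoptosis genes, 5 novel genes.
subset = {f"known{i}" for i in range(1, 6)} | {f"novel{i}" for i in range(1, 6)}
annotations = {f"known{i}": "apoptosis" for i in range(1, 6)}
proposals = propose_functions(subset, annotations)
print(sorted(proposals.items()))  # each novel gene is paired with 'apoptosis'
```

Such proposals are hypotheses to be tested experimentally, not assignments of function in themselves.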

It should be cautioned that, in carrying out the procedures of the
present invention as disclosed herein, any reference to particular buffers,
media, reagents, cells, culture conditions and the like is not intended to be
limiting, but is to be read so as to include all related materials that one of
ordinary skill in the art would recognize as being of interest or value in the
particular context in which that discussion is presented. For example, it is
often possible to substitute one buffer system or culture medium for another
and still achieve similar, if not identical, results. Those of skill in the art will
have sufficient knowledge of such systems and methodologies so as to be
able, without undue experimentation, to make substitutions that will optimally
serve their purposes in using the methods and procedures disclosed herein.

The present invention will now be further described by way of the
following non-limiting example, but it should be kept clearly in mind that other
and different embodiments of the methods disclosed according to the present
invention will no doubt suggest themselves to those of skill in the relevant art.

EXAMPLE 

SW480 cells are grown to a density of 10⁵ cells/cm² in Leibovitz's L-15
medium supplemented with 2 mM L-glutamine (90%) and 10% fetal bovine
serum. The cells are collected after treatment with 0.25% trypsin, 0.02%
EDTA at 37°C for 2 to 5 minutes. The trypsinized cells are then diluted with
30 ml growth medium and plated at a density of 50,000 cells per well in a
96-well plate (200 µl/well). The following day, cells are treated for 24 hours
with either compound buffer alone, or compound buffer containing a chemical
agent to be tested. The medium is then removed, the cells lysed and the RNA
recovered using the RNeasy reagents and protocol obtained from Qiagen.
RNA is quantitated and 10 ng of sample in 1 µl is added to 24 µl of TaqMan
reaction mix containing 1X PCR buffer, RNasin, reverse transcriptase,
nucleoside triphosphates, AmpliTaq Gold, Tween 20, glycerol, bovine serum
albumin (BSA) and specific PCR primers and probes for a reference gene
(18S RNA) and a test gene (Gene X). Reverse transcription is then carried out
at 48°C for 30 minutes. The sample is then applied to a Perkin Elmer 7700
sequence detector and heat denatured for 10 minutes at 95°C. Amplification
is performed through 40 cycles using 15 seconds annealing at 60°C followed
by a 60 second extension at 72°C and 30 second denaturation at 95°C. Data
files are then captured and the data analyzed with the appropriate baseline
windows and thresholds.
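
The baseline and threshold analysis mentioned at the end of the example can be illustrated by computing a threshold cycle (Ct) from a single amplification curve. This sketch assumes one fluorescence reading per cycle, subtracts the mean signal over a baseline window, and linearly interpolates the cycle at which the corrected signal first crosses the threshold; the function name threshold_cycle, the window of cycles 3 to 15, the threshold value and the synthetic curve are hypothetical, and instrument software may use different conventions.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float,
                    baseline: tuple[int, int] = (3, 15)) -> Optional[float]:
    """Fractional cycle at which baseline-corrected signal first reaches threshold.

    fluorescence[i] is the reporter signal read after cycle i + 1.
    baseline gives the first and last cycle (1-based, inclusive) of the
    hypothetical window whose mean is subtracted. Returns None if the
    threshold is never crossed.
    """
    first, last = baseline
    window = fluorescence[first - 1:last]
    offset = sum(window) / len(window)
    corrected = [f - offset for f in fluorescence]
    for i in range(1, len(corrected)):
        prev, curr = corrected[i - 1], corrected[i]  # cycles i and i + 1
        if prev < threshold <= curr:
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - prev) / (curr - prev)
    return None


# Hypothetical curve: flat signal for 20 cycles, then a rising signal.
curve = [1.0] * 20 + [1.1, 1.3, 1.7, 2.5, 4.1]
print(threshold_cycle(curve, threshold=0.2))  # approximately 21.5
```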

The quantitative difference between the target and reference genes is
then calculated and a relative expression value determined for all of the
samples used. This procedure is then repeated for each of the target genes in
a given signature, or characteristic, set and the relative expression ratio for
each pair of genes is determined (i.e., a ratio of expression is determined for
each target gene versus each of the other genes for which expression is
measured, where each gene's expression is determined relative to
the reference gene for each compound, or chemical agent, to be screened).
The samples are then scored and ranked according to the degree of alteration
of the expression profile in the treated samples relative to the control. The
overall expression of the set of genes relative to the controls, as modulated by
one chemical agent relative to another, is also ascertained. Chemical agents
having the most effect on a given gene, or set of genes, are considered the
most anti-neoplastic.
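
The relative expression, pairwise ratio and ranking steps can be sketched with the comparative Ct method, which assumes roughly 100% amplification efficiency so that each cycle corresponds to a 2-fold change; the example does not state how the quantitative difference is computed, so that assumption is ours. The alteration score (summed absolute log2 fold change versus control across the signature set), the function names and the Ct values are hypothetical.

```python
import math
from itertools import combinations


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to the reference gene (e.g. 18S RNA).

    Assumes ~100% amplification efficiency, so one cycle = a 2-fold change.
    """
    return 2.0 ** -(ct_target - ct_reference)


def pairwise_ratios(rel: dict[str, float]) -> dict[tuple[str, str], float]:
    """Ratio of relative expression for every unordered pair of genes."""
    return {(a, b): rel[a] / rel[b] for a, b in combinations(sorted(rel), 2)}


def alteration_score(treated: dict[str, float], control: dict[str, float]) -> float:
    """Hypothetical score: summed absolute log2 fold change versus control."""
    return sum(abs(math.log2(treated[g] / control[g])) for g in control)


def rank_compounds(treated_by_compound: dict[str, dict[str, float]],
                   control: dict[str, float]) -> list[str]:
    """Compounds ordered from largest to smallest alteration of the profile."""
    return sorted(treated_by_compound,
                  key=lambda c: alteration_score(treated_by_compound[c], control),
                  reverse=True)


# Hypothetical Ct values; the reference gene crosses threshold at cycle 10.
control = {"geneX": relative_expression(25.0, 10.0),
           "geneY": relative_expression(24.0, 10.0)}
treated = {
    "compound_1": {"geneX": relative_expression(27.0, 10.0),   # 4-fold down
                   "geneY": relative_expression(26.0, 10.0)},  # 4-fold down
    "compound_2": {"geneX": relative_expression(25.0, 10.0),   # unchanged
                   "geneY": relative_expression(25.0, 10.0)},  # 2-fold down
}
print(pairwise_ratios(control))           # {('geneX', 'geneY'): 0.5}
print(rank_compounds(treated, control))   # ['compound_1', 'compound_2']
```

Because this score ignores direction, a compound that strongly up-regulates the set ranks alongside one that strongly down-regulates it; a directional score (for example, a signed sum for a set that is up-regulated in cancer) would instead favor compounds that make the profile look more normal.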

In carrying out the methods of the invention, it is to be expected that
not all cells of a given sample of suspected cancerous cells will express all, or
even most, of these genes, but that substantial expression thereof in a
substantial number of such cells is sufficient to warrant a determination of a
cancerous, or potentially cancerous, condition. The sequences disclosed
herein are presented in numerical order from SEQ ID NO: 1 to 8447, although
different genes are more or less relevant to different organs and tissues, and
some may be up-regulated in cancer cells but not normal cells while others are
up-regulated in normal cells but not cancerous cells. The sequences presented
herein may be genomic or cDNA sequences and may also be represented as
RNA sequences.

WHAT IS CLAIMED IS:

1. A process for screening for an antineoplastic agent comprising the
steps of:

(a) exposing cells to a chemical agent to be tested for antineoplastic
activity, and

(b) determining a change in expression of at least one gene that
includes one of the sequences of SEQ ID NOS: 1 - 8447, or a sequence that
is at least 95% identical thereto,

wherein a change in expression is indicative of anti-neoplastic activity.

2. The process of claim 1 wherein expression is determined for more
than one said gene.

3. The process of claim 1 wherein expression is determined for
at least 5 said genes.

4. The process of claim 1 wherein expression is determined for
at least 10 said genes.

5. The process of claim 1 wherein expression is determined for all said
genes in a given signature gene set.

6. A process for determining the cancerous status of a test cell,
comprising determining expression in said test cell of at least one gene that
includes one of the nucleotide sequences selected from the sequences of
SEQ ID NOS: 1 - 8447, or a nucleotide sequence that is at least 95%
identical thereto, and then comparing said expression to expression of said at
least one gene in at least one cell known to be non-cancerous whereby a
difference in said expression indicates that said cell is cancerous.

7. The process of claim 6 wherein said expression is the expression of
more than one said gene.

8. The process of claim 6 wherein said expression is the expression
of at least 5 said genes.

9. The process of claim 6 wherein said expression is the expression
of at least 10 said genes.

10. The process of claim 6 wherein said expression is the expression
of all said genes.

11. A process for determining a cancer initiating, facilitating or
suppressing gene comprising the steps of contacting a cell with a cancer
modulating agent and determining a change in expression of a gene selected
from the group consisting of the gene sequences of SEQ ID NO: 1 - 8447 and
thereby identifying said gene as being a cancer initiating, facilitating or
suppressing gene.

12. The process of claim 11 wherein the gene determined by said
process is an oncogene.

13. The process of claim 11 wherein the gene determined by said
process is a cancer facilitating gene.

14. The process of claim 11 wherein the gene determined by said
process is a cancer suppressor gene.

15. The process of claim 11 wherein said cancer modulating agent has
the effect of increasing gene expression.

16. The process of claim 11 wherein said cancer modulating agent has
the effect of decreasing gene expression.

17. A process for treating cancer comprising contacting a cancerous
cell with an agent having activity against an expression product encoded by a
gene sequence selected from the group consisting of SEQ ID NO: 1 - 8447.

18. The process of claim 17 wherein said cancerous cell is contacted in
vivo.

19. The process of claim 17 wherein said agent comprises a portion
having affinity for said expression product.

20. The process of claim 19 wherein said portion having affinity for said
expression product is an antibody.

21. The process of claim 17 wherein said agent is an apoptosis-inducing
agent.

22. A process for determining a cancer initiating, facilitating or
suppressing gene in a cancer cell comprising determining a change in
expression of a gene sequence selected from the group consisting of the
sequences of SEQ ID NO: 1 - 8447.

23. The process of claim 22 wherein said change in expression is
determined by determining a change in gene copy number.

24. The process of claim 23 wherein said change in copy number is an
increase in copy number.

25. The process of claim 23 wherein said change in copy number is a
decrease in copy number.

26. The process of claim 23 wherein said change in gene copy number is
determined by determining a change in expression of messenger RNA
encoded by said gene sequence.

28. The process of claim 23 wherein said gene is a cancer initiating
gene.

29. The process of claim 23 wherein said gene is a cancer facilitating
gene.

30. The process of claim 23 wherein said gene is a cancer suppressing
gene.

31. A process for diagnosing a cancerous cell comprising determining
a cancer initiating, facilitating or suppressing gene according to claim 23.

32. A process for treating cancer comprising inserting into a cancerous
cell a gene construct comprising an anti-cancer gene operably linked to a
promoter or enhancer element such that expression of said anti-cancer gene
causes suppression of said cancer and wherein said promoter or enhancer
element is a promoter or enhancer element modulating a gene sequence
selected from the group consisting of the sequences of SEQ ID NO: 1 - 8447.

33. The process of claim 32 wherein said anti-cancer gene is a cancer
suppressor gene.

34. The process of claim 32 wherein said anti-cancer gene encodes a
polypeptide having anticancer activity.

35. The process of claim 34 wherein said polypeptide has apoptotic
activity.

36. The process of claim 32 wherein said inserting into a cancerous
cell is accomplished in vivo.

37. The process of claim 32 wherein said inserting into a cancerous
cell further comprises use of a viral or plasmid agent and is accomplished
either in vitro or in vivo.

38. The process of claim 32 wherein said cancer is selected from the
group consisting of colon cancer, breast cancer, stomach cancer, lung cancer,
thyroid cancer, esophageal cancer, ovarian cancer, kidney cancer, prostate
cancer and pancreatic cancer.

39. The process of claim 32 wherein said cancer is of a type selected
from the group consisting of adenocarcinoma, carcinoma, clear cell cancer,
infiltrating ductal cancer, infiltrating lobular cancer, squamous cell carcinoma,
neuroendocrine carcinoma, papillary carcinoma and Wilms' tumor.

40. A process for determining functionally related genes comprising
contacting one or more gene sequences selected from the group consisting of
the sequences of SEQ ID NO: 1 - 8447 with an agent that modulates
expression of more than one gene in such group and thereby determining a
subset of genes of said group.

41. The process of claim 40 wherein said functionally related genes are
genes modulating the same metabolic pathway.

42. The process of claim 40 wherein said genes are genes encoding
functionally related polypeptides.

43. The process of claim 40 wherein all of said genes are genes whose
expression is modulated by the same transcription activator or enhancer
sequence.

44. A method for producing a product comprising identifying an
agent according to the process of claim 1 wherein said product is the
data collected with respect to said agent as a result of said process and
wherein said data is sufficient to convey the chemical structure and/or
properties of said agent.

45. A process for screening for an anti-neoplastic agent comprising the
steps of:

(a) exposing cells to a chemical agent to be tested for antineoplastic
activity, and

(b) determining a change in expression of at least one gene of a
signature gene set, or a sequence that is at least 95% identical thereto,

wherein a change in expression is indicative of anti-neoplastic activity.

46. The process of claim 45 wherein said change in expression is an
increase in expression.

47. The process of claim 45 wherein said change in expression is a
decrease in expression.

48. The process of claim 45 wherein said change in expression is a
change in expression of at least 5 genes of said signature gene set.

49. The process of claim 45 wherein said change in expression is a
change in expression of at least 10 genes of said signature gene set.

50. The process of claim 45 wherein said change in expression is a
change in expression of at least half of the genes of said signature gene set.